Methods for producing modified nucleic acid molecules

ABSTRACT

A method for inserting a cassette into a nucleic acid molecule to produce a modified nucleic acid molecule, e.g., a nucleic acid-cassette fusion, is described. The method uses the polymerase chain reaction, which permits precise targeting of the desired site in the molecule, but does not require ligation. Also described is the use of this method for high-throughput screening of the resulting modified nucleic acid molecules.

This is a 371 of International Application No. PCT/US00/12103, filed 04 May 2000, which claims the benefit of Provisional Application No. 60/133,391, filed 10 May 1999.

FIELD OF THE INVENTION

The invention relates generally to methods for generating modified nucleic acid molecules, and more particularly, to the use of polymerase chain reaction to generate gene knockouts and nucleic acid fusion molecules.

BACKGROUND OF THE INVENTION

There are a variety of reasons why the modification of nucleic acid sequences, particularly genes, is desirable. The classical strategy for gene disruption requires the isolation of a gene and its digestion with restriction enzymes [R. Rothstein, Methods Enzymol., 101:202-211 (1983)]. However, the use of restriction enzymes to digest the DNA fragments sometimes makes it difficult to construct appropriate fragments disrupted by a marker DNA. To overcome this problem, several methods that use the polymerase chain reaction (PCR) to construct such deletions have been developed. However, these methods still require isolation of the DNA fragment of interest or a series of complex steps [D. C. Amberg et al., Yeast, 11:1275-1280 (1995); A. Wach et al., Yeast, 10:1793-1808 (1994); A. Wach, Yeast, 12:259-265 (1996)].

One recently described method constructs gene disruption cassettes by means of PCR and ligation. See J. Nikawa and M. Kawabata, Nucleic Acids Res., 26(3):860-861 (1998). First, two separate regions of a target gene are PCR amplified with primers specific for the target sequence, using genomic DNA as a template. Second, the two PCR products are ligated to a DNA fragment of a marker gene in two separate reactions. The ligated fragments are then PCR amplified separately. Following amplification, the PCR-amplified fragments are mixed, denatured, annealed, and extended with DNA polymerase. Finally, the product is PCR amplified with the outermost primers.

Despite these recent advances, there remains a need for methods of modifying nucleic acid molecules that are more efficient, yet permit precise engineering at the target site.

SUMMARY OF THE INVENTION

The method of the invention provides a simple way to precisely generate a modified nucleic acid molecule that contains a deletion and/or an insertion. Advantageously, this method does not require ligation and is well suited for use in automated formats, including high-throughput formats.

In one aspect, the invention provides a three-stage method for inserting a cassette into a nucleic acid molecule to produce a modified nucleic acid molecule, e.g., a nucleic acid-cassette fusion, without requiring ligation. In the first stage, two separate regions of a selected nucleic acid molecule, and a cassette, are amplified. The two regions of the nucleic acid molecule have nucleotide sequences flanking a site in the molecule targeted for disruption, so that the amplification produces a first amplification product of nucleotide sequences upstream of the target site and a second amplification product of nucleotide sequences downstream of the target site. The cassette has sequences at its 5′ and 3′ ends which overlap with sequences of the two regions of the nucleic acid molecule. In the second stage, the amplified cassette product is mixed separately with each of the first and second amplification products resulting from amplification of the nucleic acid molecule. The mixture of cassette and first amplification product is amplified by PCR, thereby forming a first fusion product consisting of the first amplification product fused to the 5′ end of the first strand of the cassette. The mixture of cassette and second amplification product is likewise amplified to form a second fusion product consisting of the second amplification product fused to the 3′ end of the first strand of the cassette. In the third stage, the first and second fusion products are mixed and amplified by PCR, thereby producing a modified nucleic acid molecule comprising the cassette in the target site of the selected nucleic acid molecule. Desirably, the resulting modified nucleic acid molecule is further amplified by polymerase chain reaction.

In another aspect, the invention provides a novel method for amplifying selected sequences by PCR, which is particularly well suited for use in stage 3 of the method of the invention. In this method, a mixture containing the fusion products prepared according to stage 2 of the method of the invention is heated at about 94° C. for about 5 minutes in the absence of polymerase or primers, cooled to 50° C. over about 30 minutes, and maintained at that temperature for about 5 minutes or longer. A thermostable polymerase is then added to the mixture, which is heated to about 72° C. for about 5 minutes and then mixed with a forward primer, P1, for the first region and a reverse primer, P4, for the second region. The resulting mixture is then amplified by PCR to produce a modified nucleic acid molecule comprising the first and second regions of the nucleic acid sequence flanking the cassette.

In a further aspect, the invention provides a two-stage method of producing a modified nucleic acid molecule without ligation. The method involves producing two separate regions of a nucleic acid molecule and a cassette as in stage 1 of the three-stage method of the invention. Thereafter, the three products are mixed and subjected to amplification by PCR, as described in the aspect above. Thus, this embodiment of the invention eliminates stage 2 of the three-stage method.

In yet a further aspect, the invention provides modified nucleic acid sequences produced using the method of the invention.

In yet another aspect, the present invention provides a method for the high-throughput preparation of disrupted Streptococcus DNA sequences without ligation. This method involves mixing, in each of the wells of a plate containing a plurality of reaction wells, (a) a nucleic acid molecule comprising Streptococcus DNA sequences that comprise a first region upstream of a site in the Streptococcus DNA targeted for disruption and a second region downstream of the target site, said first and second regions each having a first and a second end; (b) a cassette comprising, at one end, nucleotide sequences which overlap with nucleotides at the second end of the first region and, at its other end, nucleotides which overlap with nucleotides of the first end of the second region; and (c) primers for the first and second regions. This mixture is then subjected to PCR, thereby amplifying the first and second regions of the selected Streptococcus DNA sequences. The cassette and the amplified first and second regions of the Streptococcus DNA sequences are then mixed and subjected to polymerase chain reaction to produce a nucleic acid fusion molecule comprising the first and second regions of the Streptococcus DNA sequence flanking the cassette.

Other aspects and advantages of the invention will be readily apparent from the detailed description of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a novel method for rapidly generating modified nucleic acid molecules, wherein the modification involves insertion of a cassette and/or deletion of desired sequences. This method eliminates the ligation steps required in known methods for generating knock-out genes, and permits precise targeting of the site in the nucleic acid molecule for insertion of a cassette or deletion of sequences. Further, the method of the invention is readily adapted for use in high-throughput screening.

Thus, the invention provides a method for modifying a nucleic acid molecule at a predetermined target site by insertion and/or deletion of nucleic acid sequences in the absence of ligation. The method of the invention may be used to make unmarked deletions by using primers having an overlap region. More desirably, the method of the invention is used in the production of a modified nucleic acid molecule which is a nucleic acid-cassette fusion. Optionally, this nucleic acid-cassette fusion may be a knock-out construct.

A knock-out construct refers to a modified nucleic acid molecule in which the function of a selected gene in the molecule has been disrupted, either by its deletion (partial or full) or by the insertion of a cassette which eliminates its function. In certain instances, a knock-out construct may have both a deletion and an inserted cassette.

As used herein, a nucleic acid molecule is composed of nucleotide sequences of RNA or DNA. The RNA or DNA may be double- or single-stranded and may be readily selected from the different subtypes of RNA (e.g., mRNA or tRNA) or DNA (e.g., genomic, chromosomal, or cDNA). Optionally, the nucleotides of these molecules may contain modifications, e.g., labels which are known in the art, methylation, a “cap”, substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications. A particular advantage of the method of the invention is that it can readily be applied to nucleic acid molecules regardless of whether they are linear or circular, e.g., plasmids. The nucleic acids used in the method of the invention may be obtained from any suitable source, including, for example, viruses, plasmids, yeast, gram-positive and gram-negative bacteria, eukaryotic cells, and the like. Currently preferred sources of bacterial nucleic acids include the gram-positive Streptococcus and Staphylococcus and the gram-negative Haemophilus influenzae. However, selection of the nucleic acid molecule is not a limitation of the present invention.

A target site is a location within a nucleic acid molecule or sequence into which a cassette is to be inserted or from which sequences are to be deleted. Suitably, a target site may be composed of two nucleotides, between which an insertion is to be made, or a group of nucleotides, e.g., from two to ten bases in length, which are to be deleted and/or into which the cassette is to be inserted. In certain embodiments (e.g., where deletions are to be made), the target site may be larger than 50 bases. In these embodiments, the target site may range from 50 bp to 5000 bp, 500 bp to 3000 bp, 1000 bp to 2500 bp, or other suitable sizes within these ranges. The target site need not be a coding sequence. In one embodiment, the target sequence may be selected particularly for use in essentiality testing or expression studies.

As used herein, a “cassette” is a nucleic acid sequence targeted for insertion into the target site of a nucleic acid molecule and/or for fusion with two regions of the nucleic acid molecule. Such cassettes may be composed of single- or double-stranded sequences, and may be linear or circular. While the size of a cassette useful in the invention is not a limitation, it is generally at least 10 nucleotides and up to about 5000 nucleotides in length. Preferably, the nucleic acid sequence is a DNA sequence which performs some function. For example, the cassette may be readily selected from among known marker genes, including, e.g., antibiotic resistance genes (e.g., erythromycin, tetracycline, and chloramphenicol resistance genes), reporter genes including those which are colorimetrically detectable, regulatory sequences including promoters, terminators, operators, and the like, and other functional DNA sequences, e.g., sequences encoding therapeutic or antigenic proteins. Alternatively, the cassette may be an oligonucleotide which introduces one or more base pair changes into the nucleic acid molecule to produce a desired effect in the resulting modified nucleic acid molecule. In still another alternative, the cassette may simply be a non-functional DNA sequence which is inserted to interrupt translation and expression of a protein encoded by a sequence located downstream of the target site. The cassettes used in the invention are engineered to contain sequences at the 5′ and 3′ ends which overlap with (i.e., are identical to) sequences of the regions of the nucleic acid molecule flanking the target site. Thus, a cassette of the invention composed of double-stranded DNA would have a first strand with, at its 5′ end, nucleotide sequences which overlap with nucleotides of a strand of the upstream region and, at its 3′ end, nucleotide sequences which overlap with nucleotides of a strand of the downstream region.
The region of overlap is from about 10 nt to about 50 nt in length, preferably about 15 nt to about 35 nt, and most preferably about 20 nt. The cassettes useful in the invention may be readily obtained by a variety of conventional methods, including genetic engineering methods and chemical synthesis.
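The overlap rule above can be checked in silico before primers are ordered. The following is a minimal sketch, not part of the described method: it assumes the upstream region, cassette, and downstream region are available as first-strand sequences written 5′→3′, and the helper names `overlap_length` and `classify_overlap` are hypothetical. It finds the longest stretch at the end of one fragment that is identical to the start of the next and bins that length into the ranges stated in the text.

```python
def overlap_length(left: str, right: str, max_len: int = 60) -> int:
    """Longest k such that the last k nt of `left` equal the first k nt of `right`.

    Both sequences are assumed to be first-strand, written 5'->3'.
    """
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right), max_len), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def classify_overlap(k: int) -> str:
    """Bin an overlap length using the ranges given in the text."""
    if not 10 <= k <= 50:
        return "outside 10-50 nt"
    if 15 <= k <= 35:
        return "preferred (15-35 nt)"
    return "acceptable (10-50 nt)"


if __name__ == "__main__":
    shared = "GGCATCGATCGTACGATCGA"        # illustrative 20-nt overlap
    upstream = "TTTTTTTTTT" + shared       # upstream region ends with the overlap
    cassette = shared + "AAAAAAAAAA"       # cassette 5' end repeats it
    k = overlap_length(upstream, cassette)
    print(k, classify_overlap(k))          # prints: 20 preferred (15-35 nt)
```

The same check applies at the other junction by passing the cassette as `left` and the downstream region as `right`.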

As used herein, the term “upstream region” refers to those sequences of nucleic acid which are located 5′ to the target site, with reference to the coding strand of the nucleic acid molecule. However, the upstream region need not be composed of sequences which encode a desired protein, peptide or other gene product. Where the target site is located within an open reading frame (ORF), the upstream region preferably contains sequences flanking the targeted ORF. Suitably, where the modified nucleic acid molecule is to be a knock-out construct, the upstream region contains sufficient homology to mediate homologous recombination between the modified nucleic acid molecule and the non-disrupted gene in a host cell into which the modified nucleic acid molecule is transformed. Generally, about 100 nt to about 1000 nt, and preferably at least about 500 nt, of homologous sequence is considered sufficient. Preferably, these “homologous sequences” are exactly (i.e., 100%) identical. However, the “homologous sequences” may contain some degree of non-identity. Where there is some degree of non-identity, the sequences suitably have at least 95% identity, more preferably 97% identity, and most preferably 98-99% identity. In other embodiments, particularly where homologous recombination is not desired following transformation of a host cell, the size of the upstream region may be readily determined by one of skill in the art. For example, the upstream region may be as small as about 100 bp and as large as 500 kb, or more.
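The identity thresholds above can be illustrated with a simple calculation. This is a simplified sketch, not the BESTFIT or FASTA programs discussed below: it assumes the two homology arms are already aligned and of equal length, counts position-by-position matches without gaps, and maps the result onto the preference levels in the text. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped identity (%) of two pre-aligned, equal-length sequences.

    Simplification: programs such as BESTFIT or FASTA first build a gapped
    alignment; this helper only compares the sequences position by position.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def homology_tier(pid: float) -> str:
    """Map an identity value onto the preference levels stated in the text."""
    if pid == 100.0:
        return "exact (100%)"
    if pid >= 98.0:
        return "most preferred (>=98%)"
    if pid >= 97.0:
        return "more preferred (>=97%)"
    if pid >= 95.0:
        return "suitable (>=95%)"
    return "below 95%"


if __name__ == "__main__":
    arm = "ACGT" * 125                     # hypothetical 500-nt homology arm
    variant = arm[:-5] + "TTTTT"           # last five positions altered
    pid = percent_identity(arm, variant)
    print(f"{pid:.1f}% -> {homology_tier(pid)}")
```

For arms that differ by insertions or deletions, an alignment program should be used instead, since a single indel would make every downstream position look mismatched here.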

The term “downstream region” refers to those sequences of nucleic acids which are located 3′ to the target site, with reference to the coding strand of the nucleic acid molecule. As with the upstream region, the downstream region need not be composed of coding sequences; and, where the target site is located within an ORF, the downstream region preferably contains sequences flanking the targeted ORF. Suitably, the size of the downstream region is determined by the factors described above with respect to the upstream region. However, it will be understood that the sizes of the downstream region and upstream region may be selected independently of one another.

It should be noted that although the discussion refers in many places to double-stranded DNA for convenience, it will be understood that the method of the invention is useful with single-stranded nucleic acid sequences. Further, it will be recognized that even where the nucleic acid molecule and the insertion cassette are double-stranded, single-stranded DNA may be added to the PCR mixture for use in obtaining the desired amplification product(s).

As known in the art, “homology” or “identity” means the degree of sequence relatedness between two polypeptide or two polynucleotide sequences, as determined by the identity of the match between two lengths of such sequences. Both identity and homology can be readily calculated by methods extant in the prior art [see, e.g., COMPUTATIONAL MOLECULAR BIOLOGY, Lesk, A. M., ed., Oxford University Press, New York (1988); BIOCOMPUTING: INFORMATICS AND GENOME PROJECTS, Smith, D. W., ed., Academic Press, New York (1993); COMPUTER ANALYSIS OF SEQUENCE DATA, PART I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey (1994); SEQUENCE ANALYSIS IN MOLECULAR BIOLOGY, von Heijne, G., Academic Press (1987); and SEQUENCE ANALYSIS PRIMER, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York (1991)]. While there exist a number of methods to measure identity and homology between two polynucleotide sequences, the terms “identity”, “similarity” and “homology” are well known to skilled artisans [H. Carillo and D. Lipton, SIAM J. Applied Math., 48:1073 (1988)]. Methods commonly employed to determine identity or homology between two sequences include, but are not limited to, those disclosed in Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego (1994), and H. Carillo and D. Lipton, SIAM J. Applied Math., 48:1073 (1988). Preferred methods to determine identity or homology are designed to give the largest match between the two sequences tested. Methods to determine identity and similarity are codified in computer programs. Preferred computer program methods to determine identity and homology between two sequences include, but are not limited to, the algorithm BESTFIT from the GCG program package [J. Devereux et al., Nucl. Acids Res., 12(1):387 (1984)], the related MACVECTOR program (Oxford), and the FASTA (Pearson) programs, which may be used at default settings or at modified settings as determined to be suitable by one of skill in the art.

I. Three Stage PCR

It will be readily recognized by one of skill in the art that the methods of the invention may be performed in a high-throughput format, i.e., the reactions may be performed on a plate containing a multiplicity of reaction chambers, e.g., 96, 384, or 1536 wells. Such plates are readily available from a variety of sources. However, the reactions may similarly be performed in a variety of other suitable reaction vessels, e.g., tubes. Selection of the reaction vessel is not a limitation of the present invention.
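For the high-throughput format, each knock-out target needs a well address. The sketch below is an illustrative bookkeeping helper, not part of the described method. It assumes the common plate geometries of 8 × 12 (96 wells), 16 × 24 (384 wells) and 32 × 48 (1536 wells), with rows beyond Z labelled AA, AB, and so on; the function names and target gene names are hypothetical.

```python
import string

# Rows x columns for common plate formats (assumed; confirm with the plate vendor).
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}


def row_label(i: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', ..., 31 -> 'AF' (two-letter rows on 1536-well plates)."""
    letters = string.ascii_uppercase
    if i < 26:
        return letters[i]
    return letters[i // 26 - 1] + letters[i % 26]


def well_ids(n_wells: int) -> list[str]:
    """Row-major well names, e.g. A1, A2, ..., H12 for a 96-well plate."""
    rows, cols = PLATE_LAYOUTS[n_wells]
    return [f"{row_label(r)}{c + 1}" for r in range(rows) for c in range(cols)]


def assign_targets(targets: list[str], n_wells: int = 96) -> dict[str, str]:
    """Map each target gene (hypothetical names) to one reaction well."""
    wells = well_ids(n_wells)
    if len(targets) > len(wells):
        raise ValueError(f"{len(targets)} targets exceed a {n_wells}-well plate")
    return dict(zip(wells, targets))


if __name__ == "__main__":
    print(assign_targets(["geneA", "geneB", "geneC"]))
```

Keeping the layout in a dictionary makes it easy to carry the same well-to-target map through all PCR stages and into the later growth assay.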

Following selection of the target site within the nucleic acid molecule, primers are obtained for specifically amplifying the upstream and downstream regions flanking the target site. Such primers may be readily generated, e.g., by chemical synthesis or other suitable means, based on knowledge of the sequence of the nucleic acid molecule, particularly in the area of the target site. Currently, it is preferable for all of the primers described for use in the method of the invention to have a G+C content of 40-50% to facilitate the PCR reaction. However, the G+C content of a primer may vary within the range of about 35% to about 55%. In one particularly desirable embodiment, a set of two primers is generated for each of the two regions flanking the target site.
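The G+C guideline above is easy to apply automatically when screening candidate primers. A minimal sketch follows; the function names are hypothetical, and the thresholds are exactly the 40-50% (preferred) and 35-55% (permitted) ranges stated in the text.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_check(primer: str) -> str:
    """Classify a primer against the 40-50% (preferred) and 35-55% (permitted) ranges."""
    gc = gc_content(primer)
    if 40.0 <= gc <= 50.0:
        return "preferred"
    if 35.0 <= gc <= 55.0:
        return "acceptable"
    return "outside range"


if __name__ == "__main__":
    for p in ["ACGTACGTACGTACGTACGT", "G" * 11 + "A" * 9, "AT" * 10]:
        print(p, f"{gc_content(p):.0f}%", gc_check(p))
```

G+C content is only one primer-quality criterion; the sketch does not check melting temperature, self-complementarity, or uniqueness in the template.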

Thus, a forward primer, P1, and a reverse primer, P2, for the region upstream of the target site are obtained from commercial sources or generated using conventional techniques. P1 is about 20 nt to about 30 nt, and more preferably about 20 nt, in length. Optionally, this primer may contain restriction sites for use in molecular cloning after generation of the final fusion product. The P1 primer is complementary to the 5′ end of a first strand of the region of the nucleic acid molecule upstream of the target site. Most preferably, the P1 primer targets the extreme 5′ end of the first strand. However, it may target sequences distal to it (e.g., several nucleotides from the extreme 5′ end) or sequences that use some of the actual coding region for homology, provided that a sufficient portion of the target site is deleted or disrupted to inactivate its function. P2 is about 30 nt to about 50 nt, and more preferably about 40 nt, in length. In addition to sequences complementary to the region upstream of the target site, this primer is designed to contain a tail with reverse complementarity to the 5′ end of the cassette. This tail is about 20 nt to 30 nt, and preferably 20 nt, in length. Generally, the P2 primer is complementary to the 5′ end of a second strand (having reverse complementarity to the first strand) immediately upstream (i.e., at the next nt base) of the target site. (Desirably, where coding sequences are targeted, the first strand may be a sense strand and the second strand may be an anti-sense strand.)

Similarly, a forward primer, P3, and a reverse primer, P4, are obtained for the region downstream of the target site. P3 contains a tail of forward polarity corresponding to the 3′ end of the cassette (i.e., complementary to the cassette's second strand), followed by forward sequences homologous to the sequences downstream of the target site. Suitably, the P3 primer is complementary to the 5′ end of a first strand immediately downstream of the target site. The tail of P3 may be about 20 nt to 30 nt, and preferably 20 nt, in length. P4 is about 20 nt to about 30 nt, and more preferably about 20 nt, in length. Optionally, this primer may contain restriction sites for use in molecular cloning after generation of the modified nucleic acid molecule which is the final fusion product. P4 amplifies the sequences at the 5′ end of the second strand (having reverse complementarity to the first strand) of the downstream region of the nucleic acid molecule. Most preferably, the P4 primer targets the extreme 5′ end of the second strand. However, it may target sequences distal to it (e.g., several nucleotides from the extreme 5′ end) or sequences that use some of the actual coding region for homology, provided that a sufficient portion of the target site is deleted or disrupted to inactivate its function.
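The primer layout for P1-P4 and the cassette primers R1/R2 can be made concrete with a short sketch. It assumes the template is known as a single first-strand sequence written 5′→3′, that the target site is given as a half-open index range, and that each homology region is 500 nt; `design_primers` and `revcomp` are hypothetical helper names, and the 20-nt annealing and tail lengths are the preferred values from the text. All primers, including the reverse primers, are returned as their own 5′→3′ sequences, and P3's annealing portion is placed on the first nucleotides after the target site.

```python
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template: str, site_start: int, site_end: int, cassette: str,
                   flank: int = 500, anneal: int = 20, tail: int = 20) -> dict[str, str]:
    """Hypothetical helper returning P1-P4 and R1/R2, all written 5'->3'.

    template: first strand, 5'->3'
    [site_start, site_end): target site; equal values mean a pure insertion point
    flank: length of each homology region (500 nt is an illustrative default)
    """
    upstream = template[max(0, site_start - flank):site_start]
    downstream = template[site_end:site_end + flank]
    return {
        # Forward primer at the extreme 5' end of the upstream region.
        "P1": upstream[:anneal],
        # Reverse primer: 5' tail = reverse complement of the cassette's 5' end;
        # 3' portion anneals immediately upstream of the target site.
        "P2": revcomp(upstream[-anneal:] + cassette[:tail]),
        # Forward primer: 5' tail = the cassette's 3' end, followed by the
        # sequence immediately downstream of the target site.
        "P3": cassette[-tail:] + downstream[:anneal],
        # Reverse primer at the far end of the downstream region.
        "P4": revcomp(downstream[-anneal:]),
        # Cassette primers.
        "R1": cassette[:anneal],
        "R2": revcomp(cassette[-anneal:]),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    template = "".join(rng.choice("ACGT") for _ in range(2000))  # stand-in genome fragment
    cassette = "".join(rng.choice("ACGT") for _ in range(800))   # stand-in marker gene
    for name, seq in design_primers(template, 1000, 1010, cassette).items():
        print(f"{name} ({len(seq)} nt): {seq}")
```

With these defaults, P2 and P3 come out at 40 nt (20 nt of homology plus a 20-nt tail), in line with the lengths given above. Restriction sites, if wanted, would be prepended to P1 and P4.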

In certain situations, it may be desired not only to introduce a cassette into a target site, but also to delete sequences from the nucleic acid molecule in doing so. In such situations, the target site is a short sequence as defined above rather than a location between two nucleotide bases, and the primers are designed to amplify the regions upstream and downstream of the target site sequences. Similarly, the cassette is designed to contain sequences overlapping with the nucleotide bases flanking either side of the target sequence. Thus, performance of the method steps described herein will result in a modified nucleic acid molecule containing an upstream region fused to the cassette, which is in turn fused to a downstream region, and further containing a deletion of the target sequences of the nucleic acid molecule.
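The expected product of a combined insertion and deletion can be predicted before any PCR is run, which is useful when checking product size on a gel or comparing against sequencing data. The sketch below assumes a first-strand template written 5′→3′ and a half-open target-site range; `predicted_construct` is a hypothetical helper name. Passing an empty cassette models an unmarked deletion, and an empty target site (start equal to end) models a pure insertion.

```python
def predicted_construct(template: str, site_start: int, site_end: int,
                        cassette: str, flank: int = 500) -> tuple[str, str]:
    """Hypothetical helper: predicted first-strand product of the P1..P4 reaction.

    Returns (construct, deleted), where `deleted` is the target-site sequence
    removed from the template. An empty `cassette` models an unmarked deletion.
    """
    if not 0 <= site_start <= site_end <= len(template):
        raise ValueError("invalid target site")
    upstream = template[max(0, site_start - flank):site_start]
    downstream = template[site_end:site_end + flank]
    return upstream + cassette + downstream, template[site_start:site_end]


if __name__ == "__main__":
    template = "AAAAAAAA" + "CCCC" + "GGGGGGGG"   # target site = the four C's
    construct, deleted = predicted_construct(template, 8, 12, "TT", flank=6)
    print(construct, deleted)                     # prints: AAAAAATTGGGGGG CCCC
```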

A. Stage 1 Amplification

In one desired embodiment, two separate regions of the nucleic acid molecule flanking the target site are produced by PCR, using P1/P2 for the upstream homology region and P3/P4 for the downstream homology region.

The PCR steps in the method of the invention are performed with a thermostable DNA or RNA polymerase and a polymerase having 3′-5′ exonuclease activity to remove non-template bases at the 3′ and 5′ ends. For example, a particularly suitable thermostable DNA polymerase is Taq DNA polymerase. The native enzyme may be purified from Thermus aquaticus, or the enzyme may be genetically engineered, synthesized, or obtained from a commercial source (e.g., as AMPLITAQ®). Taq is particularly desirable because it carries 5′ polymerization-dependent exonuclease activity. Thus, if this polymerase is selected, it is only necessary to include in the reaction mixture a proof-reading polymerase with 3′ exonuclease activity. Suitably, high-fidelity polymerases are also desirable because they possess 3′ and/or 5′ exonuclease activity. Examples of high-fidelity polymerases include Pfu (has 3′ proof-reading activity), Pwu (has 5′ proof-reading activity), Vent, Deep Vent, Hot Tub, Tfl, and Thr polymerases. However, other suitable polymerases may be selected and obtained from a variety of commercial sources (e.g., Stratagene). Alternatively, other DNA polymerases may be readily selected and 5′ and/or 3′ exonucleases added if these functions are not provided by the selected polymerase. Such polymerases and exonucleases may be readily selected by one of skill in the art and obtained from a variety of sources. Reaction conditions are as specified by the enzyme supplier, with extension times adjusted for the expected product size. See also, e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual”, 2nd Ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), ch. 14.2-14.4, for a general discussion of suitable PCR reagents, buffers, and conditions.

The upstream and downstream regions of the nucleic acid molecule may be generated in a single reaction or in separate reactions, as desired. Desirably, the final products are purified to homogeneity. This purification can be performed using conventional techniques, including spin dialysis in microconcentrators or polyacrylamide or agarose gel electrophoresis. See Sambrook et al., cited above. An example of a suitable commercially available system is Qiagen's affinity matrix purification system. However, other commercially available systems may be readily selected.

Suitably, the cassette is also amplified via PCR using a proof-reading polymerase as described above for the upstream and downstream regions. The forward primer, R1, and reverse primer, R2, for the cassette are obtained using conventional techniques such as those described above. These primers are generally about 20 to about 30 nt in length. Optionally, the cassette may be amplified in a reaction which also contains the upstream region and downstream region. Alternatively, the cassette is amplified in separate reactions with the upstream region or the downstream region. In yet another alternative, the cassette is amplified prior to mixture with either the upstream or downstream region. Following amplification, the final product is purified to homogeneity as described herein.

B. Stage 2 Amplification

The product resulting from amplification of the cassette is mixed with an approximately equivalent amount of the product of the amplification of either the upstream region or the downstream region. For a typical PCR reaction, the amount of each amplification product mixed is about 0.1 μg. However, these amounts may be adjusted, e.g., from as little as about 0.05 μg to as much as about 0.5 μg to about 1.0 μg. These separate PCR reactions use proof-reading polymerases as described above for stage 1 amplification, together with the appropriate primers. More particularly, for the mixture containing the upstream region and the cassette, primers P1 and R2 are used. The resulting product is a fusion product having the 3′ end of a first strand of the upstream region fused to the 5′ end of the first strand of the cassette. For the mixture containing the cassette and the downstream region, primers R1 and P4 are used. The resulting fusion product has the 3′ end of the first strand of the cassette fused to the 5′ end of the first strand of the downstream region. Preferably, the resulting fusion products are purified to homogeneity.
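The fusions in stages 2 and 3 rely on overlap extension: two fragments whose ends are identical anneal through that shared stretch and are extended into one molecule in which the shared sequence appears once. The sketch below predicts those products in silico under stated assumptions: all sequences are first-strand, 5′→3′ toy strings, the overlaps are shortened to 10 nt for readability (the text prefers about 20 nt), and `fuse` is a hypothetical helper that models only the sequence outcome, not reaction efficiency.

```python
def fuse(left: str, right: str, min_overlap: int = 10) -> str:
    """Predict the product of overlap-extension PCR between two fragments.

    `left` must end with the same sequence that `right` begins with
    (first strand, 5'->3'); the shared stretch appears once in the product.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no terminal overlap of at least {min_overlap} nt")


if __name__ == "__main__":
    upstream = "TTTTTTTTTTGGGGGGGGGG"                # toy upstream region
    cassette = "ACGTTGCAACGGATCCTTAACTGACTGACT"      # toy 30-nt cassette
    downstream = "CCCCCCCCCCAAAAAAAAAA"              # toy downstream region
    # Stage 1 products carry 10-nt cassette overlaps added by the P2 and P3 tails.
    up_amp = upstream + cassette[:10]
    down_amp = cassette[-10:] + downstream
    # Stage 2: the (P1, R2) reaction and the (R1, P4) reaction.
    fusion1 = fuse(up_amp, cassette)
    fusion2 = fuse(cassette, down_amp)
    # Stage 3: the two fusion products overlap across the whole cassette.
    final = fuse(fusion1, fusion2)
    print(final == upstream + cassette + downstream)   # prints: True
```

Because the search runs from the longest possible overlap downward, the stage 3 junction is found across the full cassette rather than at a shorter coincidental match.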

C. Stage 3 Amplification

The purified fusion products generated as described above are then mixed and subjected to PCR in order to generate a modified nucleic acid molecule which contains the cassette in the target site of the selected nucleic acid molecule, flanked by the upstream region and the downstream region. While this PCR may be performed as above, using conventional PCR steps with a proof-reading polymerase, it has been found that a modification of these standard techniques provides better yield.

Thus, the third amplification stage involves the following procedure. A mixture containing the products to be amplified, e.g., the two fusion products obtained from stage 2 amplification, is heated in the absence of polymerase or primers. Suitably, this may be performed in a standard buffering solution, e.g., 50 mM KCl, 10 mM Tris-Cl and 1.5 mM MgCl₂. The heating step is performed for about 2 to about 8 minutes, preferably about 5 minutes, at a temperature of about 85° C. to about 96° C., and preferably about 94° C. The heated mixture is then cooled to a temperature of about 45° C. to 55° C., and most preferably about 50° C., over an extended period of time. Most suitably, the cooling takes place over at least about 20 minutes, and preferably over at least 30 minutes. Thereafter, the mixture is maintained at about the same temperature, e.g., at about 50° C., for at least about 5 minutes. However, this temperature may be maintained for a longer period of time, such as an hour, several hours, or overnight, if required for convenience.

Following this incubation at 50° C., a thermostable polymerase is added to the mixture. A suitable RNA or DNA polymerase may be readily selected; see the discussion of polymerases in the section relating to stage 1 amplification. The mixture containing the products for amplification and the polymerase (and exonucleases) is heated to about 55° C. to about 75° C. for about 3 to about 20 minutes. Preferably, this heating is performed at about 72° C. for about 5 minutes. The primer P1 for the upstream region and the primer P4 for the downstream region are then added to the mixture, which is subjected to a standard 30 cycles of PCR with an extension time appropriate for the expected full-length product.
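The stage 3 procedure is essentially a thermal program, and writing it down as data makes it easy to transfer to a thermocycler or a liquid-handling script. The sketch below uses the preferred values from the two paragraphs above. The slow cool from 94° C. to 50° C. over 30 minutes corresponds to an average ramp of (94 − 50)/30 ≈ 1.47° C. per minute. The text does not give denaturation or annealing temperatures for the 30 cycles, so the sketch leaves them unspecified, and the default of 1 minute of extension per kb of product is an assumed rule of thumb, not a value from the text. The `Step` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    description: str
    temp_c: Optional[float] = None   # None where the text specifies no temperature
    minutes: Optional[float] = None


def ramp_rate(start_c: float, end_c: float, minutes: float) -> float:
    """Average cooling rate in degrees C per minute."""
    return (start_c - end_c) / minutes


def stage3_protocol(product_kb: float, min_per_kb: float = 1.0) -> list[Step]:
    """Stage 3 schedule using the preferred values from the text.

    `min_per_kb` (extension time per kb of full-length product) is an assumed
    rule of thumb; the polymerase supplier's recommendation should take priority.
    """
    return [
        Step("denature fusion products, no polymerase or primers", 94.0, 5.0),
        Step("slow cool from 94 C to 50 C", None, 30.0),
        Step("hold (may be extended, e.g. overnight)", 50.0, 5.0),
        Step("add thermostable polymerase (and exonucleases)"),
        Step("extend annealed products", 72.0, 5.0),
        Step("add primers P1 and P4"),
        Step(f"30 cycles of PCR, extension {product_kb * min_per_kb:.1f} min per cycle"),
    ]


if __name__ == "__main__":
    print(f"cooling ramp: {ramp_rate(94, 50, 30):.2f} C/min")
    for step in stage3_protocol(product_kb=2.5):
        print(step)
```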

The resulting full-length product is a modified nucleic acid molecule containing the upstream region fused to the cassette, which is in turn fused to the downstream region. Optionally, the product is purified to homogeneity prior to further amplification. Alternatively, the product may be subjected to further amplification via PCR prior to purification. Thus, the method of the invention provides a modified nucleic acid molecule containing the cassette in the target site, flanked by the upstream and downstream regions.

Optionally, the plates or tubes containing the final product, i.e., the modified nucleic acid molecule, may be stored in the freezer (e.g., at −80° C.) while awaiting further testing. Where desired, the final product is purified using any of a variety of suitable means, e.g., agarose gel electrophoresis, and, optionally, a sample may be sequenced to confirm the identity of the product.

II. Two-stage PCR

In another embodiment, the method of the invention permits one to produce a modified nucleic acid molecule without separately generating the upstream region/cassette and cassette/downstream region fusion products. In other words, stage 1 and stage 3 amplification are performed as described herein for the three-stage method, but stage 2 amplification is eliminated. As with the three-stage PCR embodiment of the invention, this method may be performed in a multi-well plate, a tube, or any other suitable reaction vessel.

In such an embodiment, the upstream region, cassette and downstream region may be produced as described in stage 1 above. Briefly, P1/P2 are used for the upstream region of the nucleic acid molecule and P3/P4 for the downstream region, using PCR with a high-fidelity polymerase possessing 3′-5′ exonuclease activity. These regions contain sufficient homology to mediate homologous recombination in a particular host cell. A fusion cassette, R, is also amplified with R1 and R2. Reaction conditions are as specified by the enzyme supplier, with extension times adjusted for the expected product size. Each final product is purified to homogeneity.

In the final stage of this embodiment of the method of the invention, approximately equivalent amounts of the cassette, upstream region and downstream region are mixed and amplified as described for stage 3 amplification, using P1 and P4 to PCR amplify the final product. Reaction conditions are as specified by the enzyme supplier, with extension times adjusted for the expected product size. The product is purified to homogeneity and, optionally, sequenced to confirm its identity.
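“Approximately equivalent amounts” can be read as equal masses (as with the 0.1 μg per product given for stage 2) or as equal numbers of molecules. When the cassette and flanking regions differ in length, these readings are not the same. The sketch below converts between the two, assuming an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation, not a value from the text); the function names and fragment sizes are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation; assumed)


def pmol_from_ug(ug: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ug) to picomoles."""
    return ug * 1e6 / (AVG_BP_MASS * length_bp)


def ug_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass (ug) of double-stranded DNA needed to supply `pmol` picomoles."""
    return pmol * AVG_BP_MASS * length_bp / 1e6


def equimolar_mix(lengths_bp: dict[str, int], pmol_each: float) -> dict[str, float]:
    """Mass of each fragment giving the same number of moles of every fragment."""
    return {name: ug_for_pmol(pmol_each, bp) for name, bp in lengths_bp.items()}


if __name__ == "__main__":
    fragments = {"upstream": 520, "cassette": 1100, "downstream": 520}  # hypothetical sizes
    for name, bp in fragments.items():
        print(f"0.1 ug of {name} ({bp} bp) = {pmol_from_ug(0.1, bp):.3f} pmol")
    print(equimolar_mix(fragments, pmol_each=0.2))
```

In this example, equal masses would supply roughly half as many cassette molecules as flanking-region molecules, which is one reason to consider matching moles when fragment sizes differ widely.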

III. The Modified Nucleic Acid Molecules

Thus, the three-stage and two-stage PCR methods of the invention may be utilized to construct modified nucleic acid molecules useful for a variety of purposes. These modified nucleic acid molecules may be intermediate products useful for subsequent molecular cloning of a desired construct. For example, a modified nucleic acid molecule of the invention may be engineered to contain restriction sites which permit rapid insertion of a digestion fragment, containing a desired portion of the modified nucleic acid molecule and the cassette, into a pre-determined location in a desired plasmid, viral vector, or the like. In such an instance, the modified nucleic acid molecules are constructed using primers containing the appropriate restriction sites to facilitate this molecular cloning. Alternatively, the modified nucleic acid molecules generated according to the invention may represent a desired end-product, e.g., for testing or for therapeutic or vaccinal use.

A. Assay Formats

Suitably, the invention provides a method for generating modified nucleic acid molecules which are suitable for constructing gene knockouts for in vitro or in vivo testing of specific genes, and particularly for testing whether such specific genes are essential for a particular function. In these embodiments, it may be desirable for the inserted cassette to contain a reporter or marker gene, as defined above. However, in other embodiments, the use of a gene encoding a therapeutic protein is desirable, and assays are performed to determine the effect of expression of the therapeutic protein on a selected host cell.

In one embodiment, gene knockouts may be tested in vitro using a high-throughput assay format. Suitably, the modified nucleic acid molecules containing the disrupted gene are constructed according to the three-stage or two-stage PCR method of the invention and contain a cassette with a marker gene. Thereafter, a suitable host cell which contains a functional gene corresponding to the gene disrupted in the modified nucleic acid molecule of the invention is added to each of the wells. For example, if the modified nucleic acid molecule is a plasmid or linear fragment containing sequences from Streptococcus pneumoniae with a functional deletion in a selected gene, one may add S. pneumoniae cells to wells containing the modified nucleic acid molecules produced according to the present invention. The plates are then incubated under conditions which promote transformation of the cells with the modified nucleic acid molecules. Most preferably, the cells used are “pre-competent” and are grown through the competent phase in the presence of the knockout constructs. Optionally, the competent phase may be induced by competence stimulating peptide (CSP). Thereafter, the plates are checked for the presence or absence of cell growth. Transformation of the cells is confirmed by detection of the marker. Where the presence of the marker is detectable, the absence of cell growth indicates that the selected gene functionally deleted in the knockout is essential for cell growth, whereas positive cell growth indicates that the functionally deleted gene is non-essential for cell growth. A similar assay format may be used to determine the impact of a foreign gene on a selected host cell, where the modified nucleic acid construct of the invention contains a cassette encoding a heterologous gene product. Alternatively, the modified nucleic acid molecules of the invention may be used in in vivo assays, many of which are known in the art. Selection of suitable in vitro and in vivo assays is not a limitation of the present invention.
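The read-out logic described above can be written as a small decision function. This is a sketch only: `transformation_confirmed` stands for whatever evidence of transformation the assay uses (here, detection of the marker), and requiring three attempts before calling a gene essential is an assumption modeled on the three attempts reported in the examples below; `call_gene` is a hypothetical name.

```python
from typing import Sequence


def call_gene(transformation_confirmed: bool,
              growth_by_attempt: Sequence[bool],
              min_attempts: int = 3) -> str:
    """Classify a knocked-out gene from a growth read-out.

    Growth in any attempt means the disrupted gene is dispensable; no growth
    in at least `min_attempts` attempts suggests the gene is essential.
    Without evidence of transformation, no call is made.
    """
    if not transformation_confirmed:
        return "inconclusive"
    if any(growth_by_attempt):
        return "non-essential"
    if len(growth_by_attempt) >= min_attempts:
        return "essential"
    return "inconclusive"


print(call_gene(True, [False, False, False]))  # essential
print(call_gene(True, [False, True]))          # non-essential
```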

In another embodiment, the method permits the rapid construction of fusion molecules encoding therapeutic or antigenic proteins for expression studies and the like. For example, this may provide a rapid method of generating vaccinal or therapeutic viral vectors, or modified bacterial vaccine candidates. In these embodiments, the cassette may include a transgene under the direction of regulatory sequences which direct its expression in a host cell. Thus, the cassette may be engineered to contain a promoter, enhancer, transcription initiation or termination sequences, efficient RNA processing signals such as splicing and polyadenylation signals (which may contain splice donor and acceptor sites), sequences that stabilize cytoplasmic mRNA, sequences that enhance translation efficiency (e.g., a Kozak consensus sequence), sequences that enhance protein stability and, when desired, sequences that enhance protein secretion, as well as other regulatory and expression control sequences. In one embodiment, the method of the invention may be used to place a chromosomal gene copy under the control of a regulatable promoter, or to place a foreign gene controlled by a regulatable promoter in a non-essential site on the chromosome. Thus, promoters may be constitutive, inducible, or regulatable. Selection of suitable promoters and other vector elements is conventional, and many such regulatory and expression control sequences are available [see, e.g., Sambrook et al., and references cited therein at, for example, pages 3.18-3.26 and 16.17-16.27, and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989].

B. Pharmaceutical Compositions

The modified nucleic acid molecules of the invention may be useful for in vitro, ex vivo, or in vivo delivery of a transgene to a selected host cell. Alternatively, the modified nucleic acid molecules of the invention may be useful in pharmaceutical compositions for ex vivo or in vivo delivery of a transgene for therapeutic or vaccinal purposes. Such pharmaceutical compositions contain the modified nucleic acid molecule produced according to the method of the invention formulated with a pharmaceutically acceptable carrier, such as water, a saline solution, a vegetable oil, or mixtures thereof. Other suitable carriers may be readily selected by one of skill in the art and are not a limitation of the present invention. Still other components customarily employed in the preparation of pharmaceutical compositions may be advantageously included, such as adjuvants, preserving agents, coloring agents, and the like.

Suitably, the molecules of the invention are combined with one or more pharmaceutically acceptable carriers, for example, solvents, diluents, and the like, and are administered in the form of sterile injectable solutions or suspensions containing the molecules in an isotonic medium. Generally, the modified nucleic acid molecules of the invention are delivered in an amount of about 0.01 μg to 100 mg per kg body weight. The molecules may be suspended in a carrier, as identified above, and delivered in doses of from about 1 mL to about 30 mL by any suitable route, including, without limitation, intravenous, intramuscular, subcutaneous, and oral. The method of administration is not limited to the delivery routes specified herein. It is within the skill of one in the art to determine the appropriate dosage regimen, taking into consideration such factors as the condition to be treated, the age, weight, sex, and condition of the patient, and the like.
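As a worked example of the stated range, the total amount per administration scales linearly with body weight. The 70 kg body weight below is an illustrative assumption, not a value from the text, and `total_amount_range_mg` is a hypothetical helper:

```python
from typing import Tuple


def total_amount_range_mg(body_weight_kg: float,
                          low_ug_per_kg: float = 0.01,
                          high_mg_per_kg: float = 100.0) -> Tuple[float, float]:
    """Return (low, high) total amounts in mg for the 0.01 ug to 100 mg/kg range."""
    low_mg = low_ug_per_kg / 1000.0 * body_weight_kg   # convert ug to mg
    high_mg = high_mg_per_kg * body_weight_kg
    return low_mg, high_mg


low, high = total_amount_range_mg(70.0)
print(f"{low:.4f} mg to {high:.0f} mg")  # 0.0007 mg to 7000 mg
```

That is, about 0.7 μg to 7 g for a 70 kg subject, suspended in the 1 mL to 30 mL dose volume.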

The following examples demonstrate the production of several modified nucleic acid molecules using the methods of the invention. These examples are illustrative only and are not a limitation of the present invention.

EXAMPLE 1 Two-piece PCR Method Used to Make an Erythromycin-resistant Knockout Cassette which when Transformed Into Streptococcus Pneumoniae Demonstrated fabH Essentiality

To knock out the S. pneumoniae gene identified as fabH, primers to the gene sequence were designed as follows. The bold underlined regions, i.e., the 5′ tails of P2 and P3, are complementary to R1 and R2, respectively, which in this experiment are designed to amplify the ermAM erythromycin resistance gene; the non-underlined regions are homologous to DNA sequences in or flanking fabH:

P1 [SEQ ID NO: 1] 5′TAAGGGGCTACATTGACCGAGTTC 3′

P2 [SEQ ID NO: 2] 5′CCGCCATTCTTTGCTGTTTCGTTCCAGCTTTTGCCATCAGTTTCT 3′

P3 [SEQ ID NO: 3] 5′GGAAAGTTACACGTTACTAAAGGCTGGGGCACGCTCATTCTTACA 3′

P4 [SEQ ID NO: 4] 5′TTTTCATAGTGCCTCCAACCTT 3′

P5 [SEQ ID NO: 5] 5′CTTATTTTTACCCATGCCCTTGT 3′

P6 [SEQ ID NO: 6] 5′CAGGCCATCCCTCCTTGGAAAATA 3′

R1 [SEQ ID NO: 7] 5′CGAAACAGCAAAGAATGGCGG 3′

R2 [SEQ ID NO: 8] 5′CCTTTAGTAACGTGTAACTTTC 3′

The two-piece PCR reaction was performed using isolated S. pneumoniae chromosomal DNA as template. In separate PCR reactions, P1/P2 were used to produce the upstream region and P3/P4 were used to produce the downstream region, by PCR with Taq polymerase (AMPLITAQ®) and Pfu proof-reading polymerase. Reaction conditions were as specified by the enzyme supplier, with extension times adjusted for the expected product size. The cassette was produced with R1/R2 using a similar PCR reaction. Each final product was purified to homogeneity on an agarose gel column.

P1/P2 413 bp

P3/P4 437 bp

R1/R2 941 bp

The Stage II purified modified nucleic acid molecule, consisting of a fabH knockout construct, was sequenced to confirm its identity and used to transform S. pneumoniae R6 competent cells using standard techniques. Briefly, the DNA was incubated with pre-competent cells, which were allowed to grow to permit phenotypic expression of the marker, and transformants were identified following growth under selective conditions. No colonies were obtained after 3 attempts, indicating that the fabH gene is essential in S. pneumoniae.

EXAMPLE 2 Three-piece PCR Method Used to Make an Erythromycin-resistant Knockout Cassette which when Transformed Into Streptococcus Pneumoniae Demonstrated fabH Essentiality

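To knock out the S. pneumoniae gene identified as fabH, primers to the gene sequence were designed as follows. The bold underlined regions, i.e., the 5′ tails of P2 and P3, are complementary to R1 and R2, respectively, which in this experiment are designed to amplify the ermAM erythromycin resistance gene; the non-underlined regions are homologous to DNA sequences in or flanking fabH: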

P1 [SEQ ID NO: 10] 5′TAAGGGGCTACATTGACCAGTTC 3′

P2 [SEQ ID NO: 11] 5′CCGCCATTCTTTGCTGTTTCGTTCCAGCTTTTGCCATCAGTTTC 3′

P3 [SEQ ID NO: 12] 5′GGAAAGTTACACGTTACTAAAGGCTGGGGCACGCTCATTCTTAC 3′

P4 [SEQ ID NO: 13] 5′TTTTCATAGTGCCTCCAACCTT 3′

P5 [SEQ ID NO: 14] 5′CTTATTTTTACCCATGCCCTTGTA 3′

P6 [SEQ ID NO: 15] 5′CAGGCCATCCCTCCTTGGAAAATA 3′

R1 [SEQ ID NO: 16] 5′CGAAACAGCAAAGAATGGCGG 3′

R2 [SEQ ID NO: 17] 5′CCTTTAGTAACGTGTAACTTTCC 3′
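Whether a chimeric primer carries the intended tail can be checked directly from the sequences: the 5′ end of P2 should equal the reverse complement of R1, and the 5′ end of P3 the reverse complement of R2. A minimal sketch using the Example 2 primers (P2 taken as the 44-base sequence of the sequence listing); `tail_matches` is a hypothetical helper:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (result in uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tail_matches(chimeric: str, cassette_primer: str) -> bool:
    """True if `chimeric` starts with the reverse complement of `cassette_primer`."""
    return chimeric.upper().startswith(reverse_complement(cassette_primer))


# Example 2 primers (S. pneumoniae fabH knockout, ermAM cassette).
P2 = "CCGCCATTCTTTGCTGTTTCGTTCCAGCTTTTGCCATCAGTTTC"
P3 = "GGAAAGTTACACGTTACTAAAGGCTGGGGCACGCTCATTCTTAC"
R1 = "CGAAACAGCAAAGAATGGCGG"
R2 = "CCTTTAGTAACGTGTAACTTTCC"

for name, primer, partner in (("P2", P2, R1), ("P3", P3, R2)):
    gene_part = primer[len(partner):]  # bases that prime on fabH sequence
    print(name, tail_matches(primer, partner), gene_part)
# P2 True TTCCAGCTTTTGCCATCAGTTTC
# P3 True CTGGGGCACGCTCATTCTTAC
```

Run on the Example 1 primers, the same check returns False for P3: its tail begins with one extra G relative to the reverse complement of the 22-base Example 1 R2 (which occurs in that P3 starting at its second base), whereas the 23-base R2 of Example 2 matches exactly.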

The three-piece PCR reaction was set up using isolated S. pneumoniae chromosomal DNA as template. The Stage I reactions were performed as described in Example 1, using the primers of this example. The product sizes were determined by agarose gel electrophoresis:

P1/P2 413 bp

P3/P4 437 bp

R1/R2 941 bp

In Stage II, two separate PCRs were performed using Taq polymerase as in the first-stage reaction. In a first PCR, 0.1 μg each of the cassette PCR product and the S. pneumoniae upstream-region PCR product were mixed; in a second PCR, 0.1 μg each of the cassette PCR product and the S. pneumoniae downstream-region PCR product were mixed. For the upstream reaction, primers P1 and R2 were used. For the downstream reaction, primers R1 and P4 were used. The two resulting fusion products, i.e., upstream region/cassette and cassette/downstream region, were purified to homogeneity prior to Stage III.
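Stage II mixes equal masses (0.1 μg) of each product. Because the products differ in length, equal mass is not an equal molar amount: the 413 bp upstream product supplies about 941/413 ≈ 2.3 times as many molecules as the 941 bp cassette. Where equimolar input is preferred (an assumption; the example specifies equal masses), mass can be converted to moles with an average mass per base pair. The value of about 650 g/mol per base pair below is a common approximation, and the helper names are hypothetical:

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair of dsDNA (common approximation)


def pmol_dsdna(mass_ug: float, length_bp: int) -> float:
    """Convert a mass of linear double-stranded DNA (ug) to picomoles."""
    grams = mass_ug * 1e-6
    return grams / (length_bp * AVG_MASS_PER_BP) * 1e12


def ug_for_pmol(pmol: float, length_bp: int) -> float:
    """Micrograms of a dsDNA fragment that contain `pmol` picomoles."""
    return pmol * 1e-12 * length_bp * AVG_MASS_PER_BP * 1e6


cassette_pmol = pmol_dsdna(0.1, 941)   # 0.1 ug of the R1/R2 cassette
upstream_pmol = pmol_dsdna(0.1, 413)   # 0.1 ug of the P1/P2 product
print(round(cassette_pmol, 3), round(upstream_pmol, 3))  # 0.163 0.373
print(round(ug_for_pmol(cassette_pmol, 413), 3))         # 0.044
```

So, to match 0.1 μg of the cassette mole for mole, about 0.044 μg of the upstream product would be used; the ratio depends only on the fragment lengths, not on the per-base-pair constant.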

Stage III was performed by mixing 0.5 μg each of the upstream region/cassette and cassette/downstream region products in a standard Taq polymerase PCR mixture lacking polymerase and primers. The reaction was held for 5 minutes at 94° C. and then taken to 50° C. over a ramp period of 30 minutes. The reaction was then held at 50° C. for 5 minutes. During this time, 2.5 U of Taq polymerase was added, and the reaction was taken to 72° C. for an extension time of 5 minutes. After this period, P1 and P4 were added, and the reaction was subjected to a standard 30-cycle PCR.
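The Stage III steps that precede primer addition can be written down as data, which makes timings easy to check or to transfer to a thermocycler program. The `Step` type and step names are illustrative; the 30-cycle PCR that follows is not modeled because its cycling parameters are not given, and the unspecified time taken to heat from 50° C. to 72° C. is omitted:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    start_c: float   # temperature at the start of the step, deg C
    end_c: float     # temperature at the end of the step, deg C
    minutes: float


# Steps before P1 and P4 are added (values from Stage III above).
STAGE_III = [
    Step("denature without polymerase or primers", 94, 94, 5),
    Step("slow ramp for annealing of overlaps", 94, 50, 30),
    Step("hold at 50 C; add 2.5 U Taq polymerase", 50, 50, 5),
    Step("extend overlaps", 72, 72, 5),
]


def total_minutes(steps: List[Step]) -> float:
    """Sum of the listed step durations."""
    return sum(step.minutes for step in steps)


def ramp_rate(step: Step) -> float:
    """Average temperature change in deg C per minute (negative when cooling)."""
    return (step.end_c - step.start_c) / step.minutes


print(total_minutes(STAGE_III))           # 45
print(round(ramp_rate(STAGE_III[1]), 2))  # -1.47
```

The ramp works out to roughly 1.5° C. per minute of cooling, far slower than a typical cycler's default ramp, presumably to give the overlapping strands time to anneal.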

The Stage III purified fabH knockout cassette was sequenced to confirm its identity and used to transform S. pneumoniae competent cells. No colonies were obtained after 3 attempts, indicating that the fabH gene is essential in S. pneumoniae.

Similar methods may be used to assay the function of non-essential genes. Where the gene is non-essential, mutant colonies will be obtained. Southern blot analysis and diagnostic PCR reactions can be used to assay the band sizes following agarose gel electrophoresis.

EXAMPLE 3 Three-stage PCR Method Used to Make a Knockout Cassette which was Cloned into a Staphylococcus aureus Plasmid for Essentiality Studies

To knock out the S. aureus gene identified as era, primers to the gene sequence were designed as follows. The bold underlined regions, i.e., the 5′ tails of P2 and P3, are complementary to R1 and R2, respectively, which in this experiment are designed to amplify the ermC erythromycin resistance gene; the non-underlined regions are homologous to DNA sequences in or flanking era. The lowercase bases represent thermal clamps (cgc) and recognition sites (ggatcc) for the restriction enzyme BamHI, used for cloning purposes:

P1 [SEQ ID NO: 18] 5′cgcggatccTGTTGTAGATTTAGTGACCG 3′

P2 [SEQ ID NO: 19] 5′CGGGATACAAAGACATAATCTTCCCTACATTTGGTCTACC 3′

P3 [SEQ ID NO: 20] 5′GTAAGTTAAGGGATGCATAATGGTTATGTTGAAGACCAAG 3′

P4 [SEQ ID NO: 21] 5′cgcggatccTCAGCTTGTGTGTCATTACC 3′

P6 [SEQ ID NO: 22] 5′ATCTTTAGAAGCCTCTTGCC 3′

R1 [SEQ ID NO: 23] 5′GATTATGTCTTTGTATCCCG 3′

R2 [SEQ ID NO: 24] 5′TTATGCATCCCTTAACTTAC 3′
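The lowercase 5′ extensions of P1 and P4 can be generated and checked programmatically. A minimal sketch, assuming the clamp-plus-site layout shown above (`cgc` followed by the BamHI site `ggatcc`); `add_cloning_site` and `site_positions` are hypothetical helpers:

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
CLAMP = "CGC"          # extra 5' bases ("thermal clamp") used in this example


def add_cloning_site(gene_specific: str, site: str = BAMHI_SITE,
                     clamp: str = CLAMP) -> str:
    """Prefix a gene-specific primer with a clamp and a restriction site.

    The added bases are written in lowercase, matching the primer listing.
    """
    return (clamp + site).lower() + gene_specific.upper()


def site_positions(seq: str, site: str) -> list:
    """0-based start positions of a recognition site (case-insensitive)."""
    s, t = seq.upper(), site.upper()
    return [i for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


P1 = add_cloning_site("TGTTGTAGATTTAGTGACCG")
P4 = add_cloning_site("TCAGCTTGTGTGTCATTACC")
print(P1, site_positions(P1, BAMHI_SITE))  # cgcggatccTGTTGTAGATTTAGTGACCG [3]
print(P4, site_positions(P4, BAMHI_SITE))  # cgcggatccTCAGCTTGTGTGTCATTACC [3]
```

The same `site_positions` check would also be worth running on the full predicted fusion product, since an internal BamHI site would be cut during cloning.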

The three-piece PCR reaction was set up as described in section B above, using isolated S. aureus WCUH29_(c) chromosomal DNA as template. The Stage I reactions produced products of the predicted sizes as determined by agarose gel electrophoresis:

P1/P2 615 bp

P3/P4 530 bp

R1/R2 1234 bp

The Stage III purified knockout cassette was cloned into pBluescript-tetA at the BamHI site to produce pEra. pEra was introduced into S. aureus RN4220 by electroporation. Colonies were obtained that were both EmR and TcR and represented plasmid-insertion cointegrants at the era locus. Diagnostic PCR products were obtained with:

R1/P6 1825 bp

indicating that the plasmid had integrated into the chromosome via the right flank (P3/P4) of homology with era.

A φ11 bacteriophage lysate was prepared on the cointegrant strain, and the resulting transducing phage were used to infect WCUH29_(c). Clones were selected that were EmR and TcS. These clones represented recombination events involving the repeated sequences of the right flanking region generated during cointegrant formation, such that plasmid sequences were excised, leaving behind an allelic exchange mutation of era. The predicted structure of the allelic exchange was indicated by diagnostic PCR with:

P1/P4 2379 bp

Since the ermC cassette introduced a new NsiI site into the era locus,the structure could be confirmed by Southern hybridization.
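A simple scan shows where an NsiI site (ATGCAT) lies in this example's own primer sequences: it occurs within R2 and, correspondingly, within the R2-complementary tail of P3. Whether this is the site used in the Southern analysis is not stated; the sketch only illustrates how such a check can be scripted. Because ATGCAT is its own reverse complement, scanning one strand suffices; `site_positions` is a hypothetical helper:

```python
NSII_SITE = "ATGCAT"  # NsiI recognition sequence (palindromic)

# Example 3 primers (S. aureus era knockout, ermC cassette).
PRIMERS = {
    "P2": "CGGGATACAAAGACATAATCTTCCCTACATTTGGTCTACC",
    "P3": "GTAAGTTAAGGGATGCATAATGGTTATGTTGAAGACCAAG",
    "R1": "GATTATGTCTTTGTATCCCG",
    "R2": "TTATGCATCCCTTAACTTAC",
}


def site_positions(seq: str, site: str) -> list:
    """0-based start positions of `site` in `seq` (case-insensitive)."""
    s, t = seq.upper(), site.upper()
    return [i for i in range(len(s) - len(t) + 1) if s.startswith(t, i)]


for name, seq in PRIMERS.items():
    print(name, site_positions(seq, NSII_SITE))
# prints, one per line: P2 [], P3 [12], R1 [], R2 [2]
```

Given full wild-type and mutant locus sequences (not provided here), the same scan would predict the NsiI fragment sizes expected on the Southern blot.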

For genes that are essential for in vitro viability, EmR TcS clones would not be recovered.

All publications cited in this specification are incorporated herein by reference. While the invention has been described with reference to a particularly preferred embodiment, it will be appreciated that modifications can be made without departing from the spirit of the invention. Such modifications are intended to fall within the scope of the appended claims.

SEQUENCE LISTING

Number of sequences: 24

SEQ ID NO: 1 (24 bp, DNA, Streptococcus pneumoniae): taaggggcta cattgaccga gttc
SEQ ID NO: 2 (45 bp, DNA, Streptococcus pneumoniae): ccgccattct ttgctgtttc gttccagctt ttgccatcag tttct
SEQ ID NO: 3 (45 bp, DNA, Streptococcus pneumoniae): ggaaagttac acgttactaa aggctggggc acgctcattc ttaca
SEQ ID NO: 4 (22 bp, DNA, Streptococcus pneumoniae): ttttcatagt gcctccaacc tt
SEQ ID NO: 5 (23 bp, DNA, Streptococcus pneumoniae): cttattttta cccatgccct tgt
SEQ ID NO: 6 (24 bp, DNA, Streptococcus pneumoniae): caggccatcc ctccttggaa aata
SEQ ID NO: 7 (21 bp, DNA, Streptococcus pneumoniae): cgaaacagca aagaatggcg g
SEQ ID NO: 8 (22 bp, DNA, Streptococcus pneumoniae): cctttagtaa cgtgtaactt tc
SEQ ID NO: 9 (23 bp, DNA, Streptococcus pneumoniae): taaggggcta cattgaccag ttc
SEQ ID NO: 10 (44 bp, DNA, Streptococcus pneumoniae): ccgccattct ttgctgtttc gttccagctt ttgccatcag tttc
SEQ ID NO: 11 (44 bp, DNA, Streptococcus pneumoniae): ggaaagttac acgttactaa aggctggggc acgctcattc ttac
SEQ ID NO: 12 (22 bp, DNA, Streptococcus pneumoniae): ttttcatagt gcctccaacc tt
SEQ ID NO: 13 (24 bp, DNA, Streptococcus pneumoniae): cttattttta cccatgccct tgta
SEQ ID NO: 14 (24 bp, DNA, Streptococcus pneumoniae): caggccatcc ctccttggaa aata
SEQ ID NO: 15 (21 bp, DNA, Streptococcus pneumoniae): cgaaacagca aagaatggcg g
SEQ ID NO: 16 (23 bp, DNA, Streptococcus pneumoniae): cctttagtaa cgtgtaactt tcc
SEQ ID NO: 17 (29 bp, DNA, Staphylococcus aureus): cgcggatcct gttgtagatt tagtgaccg
SEQ ID NO: 18 (40 bp, DNA, Staphylococcus aureus): cgggatacaa agacataatc ttccctacat ttggtctacc
SEQ ID NO: 19 (40 bp, DNA, Staphylococcus aureus): gtaagttaag ggatgcataa tggttatgtt gaagaccaag
SEQ ID NO: 20 (29 bp, DNA, Staphylococcus aureus): cgcggatcct cagcttgtgt gtcattacc
SEQ ID NO: 21 (20 bp, DNA, Staphylococcus aureus): atctttagaa gcctcttgcc
SEQ ID NO: 22 (20 bp, DNA, Staphylococcus aureus): atctttagaa gcctcttgcc
SEQ ID NO: 23 (20 bp, DNA, Staphylococcus aureus): gattatgtct ttgtatcccg
SEQ ID NO: 24 (20 bp, DNA, Staphylococcus aureus): ttatgcatcc cttaacttac

What is claimed is:
 1. A method for inserting a cassette into a DNA molecule to produce a DNA sequence fusion cassette without requiring ligation, said method comprising the steps of: (a) providing a selected DNA molecule comprising a first region of DNA sequences upstream of a site targeted for disruption and a second region of DNA sequences downstream of the site targeted for disruption, wherein said first region of DNA sequences upstream of a site targeted for disruption further comprises a first strand having a first and a second end, and said second region of DNA sequences downstream of the site targeted for disruption further comprises a first strand having a first and a second end; (b) providing a cassette comprising a first strand of DNA, wherein the first strand of said cassette comprises at its 5′ end DNA sequences which overlap with sequences at the second end of the first region of DNA sequences upstream of a site targeted for disruption, and at its 3′ end DNA sequences which overlap with sequences of the first end of the second region of DNA sequences downstream of the site targeted for disruption; (c) amplifying the first region of DNA sequences upstream of a site targeted for disruption using primers for said first region and amplifying the second region of DNA sequences downstream of the site targeted for disruption using primers for said second region, thereby producing amplified first and second regions; (d) mixing the cassette with the amplified first and second regions, thereby producing a mixture; (e) amplifying the mixture of (d) using polymerase chain reaction, thereby producing without ligation a DNA sequence fusion cassette comprising the first region of DNA sequences and second region of DNA sequences flanking the cassette, wherein said amplifying further comprises the steps of: (i) heating the mixture of (d) for about 5 minutes in the absence of polymerase or primers at about 94° C.; (ii) cooling the mixture of (i) to 50° C. over about 30 minutes; (iii) maintaining the mixture of (ii) at about 50° C. for about 5 minutes; (iv) adding a thermostable polymerase to the mixture of (iii); (v) adding a proof-reading polymerase with 3′ exonuclease activity to the mixture of (iv); (vi) heating the mixture of (v) to about 72° C. for about 5 minutes; and (vii) adding to the mixture of (vi) primers comprising a 5′ forward primer P1 complementary to said first region and a 3′ reverse primer P4 complementary to said second region.
 2. A method for inserting a cassette into a DNA molecule to produce a DNA sequence fusion cassette without requiring ligation, said method comprising the steps of: (a) providing a first region of DNA sequences and a second region of DNA sequences, said first and second regions each comprising a first strand having a first and second end; (b) mixing with the first and second regions a cassette comprising a first strand of DNA, wherein the first strand of said cassette comprises at its 5′ end DNA sequences which overlap with sequences at the second end of the first region, and at its 3′ end DNA sequences which overlap with sequences of the first end of the second region, thereby producing a mixture; (c) heating the mixture of (b) for about 5 minutes in the absence of polymerase or primers at about 94° C.; (d) cooling the mixture of (c) to 50° C. over about 30 minutes; (e) maintaining the mixture of (d) at about 50° C. for about 5 minutes; (f) adding a thermostable polymerase to the mixture of (e); (g) adding a proof-reading polymerase with 3′ exonuclease activity to the mixture of (f); (h) heating the mixture of (g) to about 72° C. for about 5 minutes; (i) adding to the mixture of (h) primers comprising a 5′ forward primer P1 complementary to the first region and a 3′ reverse primer P4 complementary to the second region, and (j) amplifying the mixture of (i) using polymerase chain reaction, thereby producing without ligation a DNA sequence fusion cassette comprising the first region of DNA sequence and second region of DNA sequences flanking the cassette.